Natural language processing for similar languages, varieties, and dialects: A survey

Similar resources

A Resource for Natural Language Processing of Swiss German Dialects

Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP applications for Swiss German. We extended the original corpus and improved its annotation consistency. Furthermore, we trained dialect-specific PoS-tagging models and implemented a baseline system for...

Discrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks

In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We participated in the Arabic dialect identification sub-task of the DSL 2016 shared task, which involves distinguishing between texts in different Arabic dialects, under the closed submission track. Our proposed approach is l...

Natural Language Processing - A Survey

"Computer, would you search the Web for references to our company and determine the 3 most common complaints mentioned about us?" "Sure, Kevin." "Then summarize and format your findings and insert it into the second page of my slide presentation." "No problem." "And also, could you give me a reminder notice a half hour before my plane is supposed to leave?"

Survey: Natural Language Parsing For Indian Languages

Syntactic parsing is a necessary step for NLP applications, including machine translation. Developing a high-quality parser for morphologically rich and agglutinative languages is challenging. Syntactic analysis is used to understand the grammatical structure of a natural language sentence. It outputs the grammatical information of each word and its constituents. Also...

Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth

We present a new method to bootstrap filter Twitter language ID labels in our dataset for automatic language identification (LID). Our method combines geolocation, original Twitter LID labels, and Amazon Mechanical Turk to resolve missing and unreliable labels. We are the first to compare LID classification performance using the MIRA algorithm and langid.py. We show classifier performance on di...

Journal

Journal title: Natural Language Engineering

Year: 2020

ISSN: 1351-3249, 1469-8110

DOI: 10.1017/s1351324920000492